Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 12 de 12
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Artigo em Inglês | MEDLINE | ID: mdl-26412852

RESUMO

Homeobox genes are a group of genes coding for transcription factors with a DNA-binding helix-turn-helix structure called a homeodomain and which play a crucial role in pattern formation during embryogenesis. Many homeobox genes are located in clusters and some of these, most notably the HOX genes, are known to have antisense or opposite strand long non-coding RNA (lncRNA) genes that play a regulatory role. Because automated annotation of both gene clusters and non-coding genes is fraught with difficulty (over-prediction, under-prediction, inaccurate transcript structures), we set out to manually annotate all homeobox genes in the mouse and human genomes. This includes all supported splice variants, pseudogenes and both antisense and flanking lncRNAs. One of the areas where manual annotation has a significant advantage is the annotation of duplicated gene clusters. After comprehensive annotation of all homeobox genes and their antisense genes in human and in mouse, we found some discrepancies with the current gene set in RefSeq regarding exact gene structures and coding versus pseudogene locus biotype. We also identified previously un-annotated pseudogenes in the DUX, Rhox and Obox gene clusters, which helped us re-evaluate and update the gene nomenclature in these regions. We found that human homeobox genes are enriched in antisense lncRNA loci, some of which are known to play a role in gene or gene cluster regulation, compared to their mouse orthologues. Of the annotated set of 241 human protein-coding homeobox genes, 98 have an antisense locus (41%) while of the 277 orthologous mouse genes, only 62 protein coding gene have an antisense locus (22%), based on publicly available transcriptional evidence.


Assuntos
Bases de Dados de Ácidos Nucleicos , Genoma Humano , Proteínas de Homeodomínio/genética , Anotação de Sequência Molecular/métodos , Família Multigênica , Pseudogenes , Animais , Sequências Hélice-Volta-Hélice , Humanos , Camundongos , RNA Longo não Codificante/genética
2.
Nucleic Acids Res ; 42(Database issue): D865-72, 2014 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-24217909

RESUMO

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.


Assuntos
Bases de Dados Genéticas , Proteínas/genética , Animais , Éxons , Genômica , Humanos , Internet , Camundongos , Anotação de Sequência Molecular , Análise de Sequência
3.
Nucleic Acids Res ; 42(Database issue): D771-9, 2014 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-24316575

RESUMO

The Vertebrate Genome Annotation (VEGA) database (http://vega.sanger.ac.uk), initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).


Assuntos
Bases de Dados Genéticas , Genoma , Anotação de Sequência Molecular , Animais , Genoma Humano , Genômica , Humanos , Internet , Camundongos , Camundongos Endogâmicos NOD , Camundongos Knockout , Ratos , Suínos/genética , Peixe-Zebra/genética
4.
Database (Oxford) ; 2013: bat011, 2013.
Artigo em Inglês | MEDLINE | ID: mdl-23589541

RESUMO

Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.


Assuntos
Genoma/genética , Gorilla gorilla/genética , Gorilla gorilla/imunologia , Complexo Principal de Histocompatibilidade/genética , Análise de Sequência de DNA , Animais , Sequência de Bases , Mapeamento Cromossômico , Humanos , Família Multigênica/genética , Pan troglodytes/genética
5.
Genome Biol ; 13(8): R72, 2012 Aug 23.
Artigo em Inglês | MEDLINE | ID: mdl-22916792

RESUMO

BACKGROUND: The FVB/NJ mouse strain has its origins in a colony of outbred Swiss mice established in 1935 at the National Institutes of Health. Mice derived from this source were selectively bred for sensitivity to histamine diphosphate and the B strain of Friend leukemia virus. This led to the establishment of the FVB/N inbred strain, which was subsequently imported to the Jackson Laboratory and designated FVB/NJ. The FVB/NJ mouse has several distinct characteristics, such as large pronuclear morphology, vigorous reproductive performance, and consistently large litters that make it highly desirable for transgenic strain production and general purpose use. RESULTS: Using next-generation sequencing technology, we have sequenced the genome of FVB/NJ to approximately 50-fold coverage, and have generated a comprehensive catalog of single nucleotide polymorphisms, small insertion/deletion polymorphisms, and structural variants, relative to the reference C57BL/6J genome. We have examined a previously identified quantitative trait locus for atherosclerosis susceptibility on chromosome 10 and identify several previously unknown candidate causal variants. CONCLUSION: The sequencing of the FVB/NJ genome and generation of this catalog has increased the number of known variant sites in FVB/NJ by a factor of four, and will help accelerate the identification of the precise molecular variants that are responsible for phenotypes observed in this widely used strain.


Assuntos
Genoma , Camundongos Endogâmicos C57BL/genética , Camundongos Endogâmicos/genética , Análise de Sequência de DNA/métodos , Animais , Feminino , Camundongos , Fenótipo , Polimorfismo de Nucleotídeo Único , Locos de Características Quantitativas , Alinhamento de Sequência
6.
Database (Oxford) ; 2010: baq001, 2010.
Artigo em Inglês | MEDLINE | ID: mdl-20428316

RESUMO

Efforts to annotate the genomes of a wide variety of model organisms are currently carried out by sequencing centers, model organism databases and academic/institutional laboratories around the world. Different annotation methods and tools have been developed over time to meet the needs of biologists faced with the task of annotating biological data. While standardized methods are essential for consistent curation within each annotation group, methods and tools can differ between groups, especially when the groups are curating different organisms. Biocurators from several institutes met at the Third International Biocuration Conference in Berlin, Germany, April 2009 and hosted the 'Best Practices in Genome Annotation: Inference from Evidence' workshop to share their strategies, pipelines, standards and tools. This article documents the material presented in the workshop.

7.
PLoS Genet ; 5(12): e1000759, 2009 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-20011118

RESUMO

An accurate and precisely annotated genome assembly is a fundamental requirement for functional genomic analysis. Here, the complete DNA sequence and gene annotation of mouse Chromosome 11 was used to test the efficacy of large-scale sequencing for mutation identification. We re-sequenced the 14,000 annotated exons and boundaries from over 900 genes in 41 recessive mutant mouse lines that were isolated in an N-ethyl-N-nitrosourea (ENU) mutation screen targeted to mouse Chromosome 11. Fifty-nine sequence variants were identified in 55 genes from 31 mutant lines. 39% of the lesions lie in coding sequences and create primarily missense mutations. The other 61% lie in noncoding regions, many of them in highly conserved sequences. A lesion in the perinatal lethal line l11Jus13 alters a consensus splice site of nucleoredoxin (Nxn), inserting 10 amino acids into the resulting protein. We conclude that point mutations can be accurately and sensitively recovered by large-scale sequencing, and that conserved noncoding regions should be included for disease mutation identification. Only seven of the candidate genes we report have been previously targeted by mutation in mice or rats, showing that despite ongoing efforts to functionally annotate genes in the mammalian genome, an enormous gap remains between phenotype and function. Our data show that the classical positional mapping approach of disease mutation identification can be extended to large target regions using high-throughput sequencing.


Assuntos
Etilnitrosoureia/farmacologia , Perfilação da Expressão Gênica , Mutação , Proteínas Nucleares/genética , Oxirredutases/genética , Animais , Mapeamento Cromossômico , Éxons , Genes Letais , Camundongos , Camundongos Mutantes
8.
Genome Biol ; 9(5): R91, 2008.
Artigo em Inglês | MEDLINE | ID: mdl-18507838

RESUMO

BACKGROUND: The major urinary proteins (MUPs) of Mus musculus domesticus are deposited in urine in large quantities, where they bind and release pheromones and also provide an individual 'recognition signal' via their phenotypic polymorphism. Whilst important information about MUP functionality has been gained in recent years, the gene cluster is poorly studied in terms of structure, genic polymorphism and evolution. RESULTS: We combine targeted sequencing, manual genome annotation and phylogenetic analysis to compare the Mup clusters of C57BL/6J and 129 strains of mice. We describe organizational heterogeneity within both clusters: a central array of cassettes containing Mup genes highly similar at the protein level, flanked by regions containing Mup genes displaying significantly elevated divergence. Observed genomic rearrangements in all regions have likely been mediated by endogenous retroviral elements. Mup loci with coding sequences that differ between the strains are identified--including a gene/pseudogene pair--suggesting that these inbred lineages exhibit variation that exists in wild populations. We have characterized the distinct MUP profiles in the urine of both strains by mass spectrometry. The total MUP phenotype data is reconciled with our genomic sequence data, matching all proteins identified in urine to annotated genes. CONCLUSION: Our observations indicate that the MUP phenotypic polymorphism observed in wild populations results from a combination of Mup gene turnover coupled with currently unidentified mechanisms regulating gene expression patterns. We propose that the structural heterogeneity described within the cluster reflects functional divergence within the Mup gene family.


Assuntos
Camundongos/genética , Proteínas/genética , Animais , Evolução Molecular , Feminino , Masculino , Espectrometria de Massas , Camundongos Endogâmicos C57BL , Camundongos Endogâmicos , Peso Molecular , Proteínas/química , Especificidade da Espécie
9.
PLoS Genet ; 2(4): e23, 2006 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-16683022

RESUMO

The mammalian transcriptome harbours shadowy entities that resist classification and analysis. In analogy with pseudogenes, we define pseudo-messenger RNA to be RNA molecules that resemble protein-coding mRNA, but cannot encode full-length proteins owing to disruptions of the reading frame. Using a rigorous computational pipeline, which rules out sequencing errors, we identify 10,679 pseudo-messenger RNAs (approximately half of which are transposon-associated) among the 102,801 FANTOM3 mouse cDNAs: just over 10% of the FANTOM3 transcriptome. These comprise not only transcribed pseudogenes, but also disrupted splice variants of otherwise protein-coding genes. Some may encode truncated proteins, only a minority of which appear subject to nonsense-mediated decay. The presence of an excess of transcripts whose only disruptions are opal stop codons suggests that there are more selenoproteins than currently estimated. We also describe compensatory frameshifts, where a segment of the gene has changed frame but remains translatable. In summary, we survey a large class of non-standard but potentially functional transcripts that are likely to encode genetic information and effect biological processes in novel ways. Many of these transcripts do not correspond cleanly to any identifiable object in the genome, implying fundamental limits to the goal of annotating all functional elements at the genome sequence level.


Assuntos
RNA Mensageiro/genética , Transcrição Gênica , Animais , Elementos de DNA Transponíveis , Evolução Molecular , Humanos , Camundongos , Regiões Promotoras Genéticas , Proteínas/genética , Pseudogenes , Reprodutibilidade dos Testes , Alinhamento de Sequência
10.
PLoS Genet ; 2(1): e9, 2006 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-16440057

RESUMO

The major histocompatibility complex (MHC) is recognised as one of the most important genetic regions in relation to common human disease. Advancement in identification of MHC genes that confer susceptibility to disease requires greater knowledge of sequence variation across the complex. Highly duplicated and polymorphic regions of the human genome such as the MHC are, however, somewhat refractory to some whole-genome analysis methods. To address this issue, we are employing a bacterial artificial chromosome (BAC) cloning strategy to sequence entire MHC haplotypes from consanguineous cell lines as part of the MHC Haplotype Project. Here we present 4.25 Mb of the human haplotype QBL (HLA-A26-B18-Cw5-DR3-DQ2) and compare it with the MHC reference haplotype and with a second haplotype, COX (HLA-A1-B8-Cw7-DR3-DQ2), that shares the same HLA-DRB1, -DQA1, and -DQB1 alleles. We have defined the complete gene, splice variant, and sequence variation contents of all three haplotypes, comprising over 259 annotated loci and over 20,000 single nucleotide polymorphisms (SNPs). Certain coding sequences vary significantly between different haplotypes, making them candidates for functional and disease-association studies. Analysis of the two DR3 haplotypes allowed delineation of the shared sequence between two HLA class II-related haplotypes differing in disease associations and the identification of at least one of the sites that mediated the original recombination event. The levels of variation across the MHC were similar to those seen for other HLA-disparate haplotypes, except for a 158-kb segment that contained the HLA-DRB1, -DQA1, and -DQB1 genes and showed very limited polymorphism compatible with identity-by-descent and relatively recent common ancestry (<3,400 generations). These results indicate that the differential disease associations of these two DR3 haplotypes are due to sequence variation outside this central 158-kb segment, and that shuffling of ancestral blocks via recombination is a potential mechanism whereby certain DR-DQ allelic combinations, which presumably have favoured immunological functions, can spread across haplotypes and populations.


Assuntos
Evolução Molecular , Haplótipos/genética , Complexo Principal de Histocompatibilidade , Mapeamento Cromossômico , Cromossomos Artificiais Bacterianos , Clonagem Molecular , Variação Genética , Antígenos HLA-DR/genética , Humanos , Polimorfismo Genético , Polimorfismo de Nucleotídeo Único , Recombinação Genética , Análise de Sequência de DNA
11.
Genome Res ; 14(6): 1176-87, 2004 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-15140828

RESUMO

The future systematic mapping of variants that confer susceptibility to common diseases requires the construction of a fully informative polymorphism map. Ideally, every base pair of the genome would be sequenced in many individuals. Here, we report 4.75 Mb of contiguous sequence for each of two common haplotypes of the major histocompatibility complex (MHC), to which susceptibility to >100 diseases has been mapped. The autoimmune disease-associated-haplotypes HLA-A3-B7-Cw7-DR15 and HLA-A1-B8-Cw7-DR3 were sequenced in their entirety through a bacterial artificial chromosome (BAC) cloning strategy using the consanguineous cell lines PGF and COX, respectively. The two sequences were annotated to encompass all described splice variants of expressed genes. We defined the complete variation content of the two haplotypes, revealing >18,000 variations between them. Average SNP densities ranged from less than one SNP per kilobase to >60. Acquisition of complete and accurate sequence data over polymorphic regions such as the MHC from large-insert cloned DNA provides a definitive resource for the construction of informative genetic maps, and avoids the limitation of chromosome regions that are refractory to PCR amplification.


Assuntos
Doenças Autoimunes/genética , Mapeamento Cromossômico/métodos , Predisposição Genética para Doença/genética , Haplótipos/genética , Complexo Principal de Histocompatibilidade/genética , Linhagem Celular , Mapeamento Cromossômico/estatística & dados numéricos , Cromossomos Artificiais Bacterianos/genética , Consanguinidade , Genes/genética , Variação Genética , Genoma Humano , Antígeno HLA-A1/genética , Antígeno HLA-A3/genética , Antígeno HLA-B8/genética , Antígenos HLA-C/genética , Antígeno HLA-DR3/genética , Humanos , Desequilíbrio de Ligação/genética , Polimorfismo Genético/genética , População Branca/genética
12.
Genome Res ; 13(6B): 1301-6, 2003 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-12819127

RESUMO

With the sequencing and annotation of genomes and transcriptomes of several eukaryotes, the importance of noncoding RNA (ncRNA)-RNA molecules that are not translated to protein products-has become more evident. A subclass of ncRNA transcripts are encoded by highly regulated, multi-exon, transcriptional units, are processed like typical protein-coding mRNAs and are increasingly implicated in regulation of many cellular functions in eukaryotes. This study describes the identification of candidate functional ncRNAs from among the RIKEN mouse full-length cDNA collection, which contains 60,770 sequences, by using a systematic computational filtering approach. We initially searched for previously reported ncRNAs and found nine murine ncRNAs and homologs of several previously described nonmouse ncRNAs. Through our computational approach to filter artifact-free clones that lack protein coding potential, we extracted 4280 transcripts as the largest-candidate set. Many clones in the set had EST hits, potential CpG islands surrounding the transcription start sites, and homologies with the human genome. This implies that many candidates are indeed transcribed in a regulated manner. Our results demonstrate that ncRNAs are a major functional subclass of processed transcripts in mammals.


Assuntos
DNA Complementar/genética , Bases de Dados Genéticas , RNA não Traduzido/genética , Animais , Biologia Computacional/métodos , Humanos , Camundongos , Camundongos Endogâmicos C57BL , RNA Complementar/química , RNA Complementar/genética , RNA não Traduzido/química , Ratos , Homologia de Sequência do Ácido Nucleico
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...